Spotify Analytics

Python coding
Author

Ryan Horn

Published

March 31, 2025

The code below filters the Spotify data frame down to the tracks by one of my favorite artists, Oasis, keeping one row per distinct track name.

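import pandas as pd
from itables import show
spotify = pd.read_csv('https://bcdanl.github.io/data/spotify_all.csv')  # same CSV loaded later in this post

# Keep only Oasis rows, then one row per distinct track name
oasis_df = (
    spotify
    [spotify['artist_name'] == 'Oasis']
    .drop_duplicates(subset='track_name')
)
show(oasis_df)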
[Interactive table with columns: pid, playlist_name, pos, artist_name, track_name, duration_ms, album_name]
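
To make the de-duplication step concrete, here is a minimal sketch on a made-up table (the rows and playlist IDs are invented for illustration, not taken from the Spotify data). `drop_duplicates(subset='track_name')` keeps the first row it encounters for each track name and discards later repeats.

```python
import pandas as pd

# Hypothetical toy data: the same track appears in two playlists
toy = pd.DataFrame({
    'pid': [1, 1, 2, 2],
    'artist_name': ['Oasis', 'Blur', 'Oasis', 'Oasis'],
    'track_name': ['Wonderwall', 'Song 2', 'Wonderwall', 'Supersonic'],
})

# Filter to one artist, then keep the first row per distinct track name
toy_oasis = (
    toy[toy['artist_name'] == 'Oasis']
    .drop_duplicates(subset='track_name')
)
print(toy_oasis)
```

One thing to keep in mind: de-duplicating on `track_name` alone would also merge two different recordings that share a title. Passing `subset=['track_name', 'album_name']` would keep those apart, if that distinction matters.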

This code counts the number of distinct Oasis tracks in the data.

# Number of distinct track names
oasis_df['track_name'].nunique()
12
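
As a quick sanity check, the sketch below (again on made-up track names) shows three ways of counting distinct tracks that should agree: the row count of `value_counts`, `Series.nunique()`, and the number of rows left after `drop_duplicates`.

```python
import pandas as pd

# Hypothetical toy data with one repeated track name
toy = pd.DataFrame({
    'track_name': ['Wonderwall', 'Supersonic', 'Wonderwall', 'Live Forever'],
})

n_value_counts = toy.value_counts(['track_name']).shape[0]  # one row per distinct name
n_nunique = toy['track_name'].nunique()                      # distinct non-null values
n_dedup = len(toy.drop_duplicates(subset='track_name'))      # rows left after de-duplication

print(n_value_counts, n_nunique, n_dedup)
```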

This code converts each track's duration from milliseconds to minutes, which is easier to read, and then shows the five longest Oasis songs.

# Convert milliseconds to seconds, then seconds to minutes
oasis_df['duration_sec'] = oasis_df['duration_ms'] / 1000
oasis_df['duration_min'] = oasis_df['duration_sec'] / 60

sorted_oasis = (
    oasis_df
    .sort_values(by='duration_min', ascending=False)
    [['track_name', 'duration_min']]
    .head(5)
)

show(sorted_oasis)
[Interactive table with columns: track_name, duration_min]
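
The unit conversion can be checked by hand on a few made-up durations. The sketch below is an illustration under assumptions: the track names and millisecond values are hypothetical, and the `duration_label` column is my own addition for readability. Since one minute is 60,000 ms, a single division is enough, and `nlargest` combines the sort and the "top N" step.

```python
import pandas as pd

# Hypothetical durations in milliseconds (not taken from the dataset)
toy = pd.DataFrame({
    'track_name': ['Track A', 'Track B', 'Track C'],
    'duration_ms': [258_000, 431_500, 180_000],
})

# 1 minute = 60,000 ms
toy['duration_min'] = toy['duration_ms'] / 60_000

# Human-readable minutes:seconds label built from whole seconds
total_sec = toy['duration_ms'] // 1000
toy['duration_label'] = (
    (total_sec // 60).astype(str) + ':' + (total_sec % 60).astype(str).str.zfill(2)
)

# nlargest sorts descending by the column and keeps the top rows
longest = toy.nlargest(2, 'duration_min')[['track_name', 'duration_min', 'duration_label']]
print(longest)
```

Decimal minutes such as 4.3 are easy to misread as "4 minutes 30 seconds", so a minutes:seconds label (here 4:18) may be the friendlier display.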

Top 10 Artists by Track Count & POS Distributions

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Load the full Spotify dataset
spotify = pd.read_csv('https://bcdanl.github.io/data/spotify_all.csv')

# 1. Find the 10 artists with the most playlist entries (rows in the data)
top10 = spotify['artist_name'].value_counts().nlargest(10).index.tolist()

# 2. Subset and plot the POS distributions with violins
df_top10 = spotify[spotify['artist_name'].isin(top10)]
plt.figure(figsize=(12, 6))
sns.violinplot(
    x='artist_name',
    y='pos',
    hue='artist_name',   # passing palette without hue is deprecated in seaborn 0.13+
    data=df_top10,
    cut=0,               # do not extend violins beyond the observed range
    inner='quartile',    # draw quartile lines inside each violin
    palette='tab10',
    legend=False         # the x-axis already labels each artist
)
plt.xticks(rotation=45, ha='right')
plt.title('POS Distribution for Top 10 Artists')
plt.xlabel('Artist')
plt.ylabel('Track Position (pos)')
plt.tight_layout()
plt.show()

Interpretation:
Among the ten artists with the most playlist entries, some (such as the top-ranked artist) show very tight violins concentrated within roughly the first 20 positions, which indicates that their tracks are consistently placed near the start of playlists. Others (e.g. the fourth- and seventh-ranked artists) have much wider violins that stretch into the bottom half of the position range, meaning their tracks appear in both early and late positions. Most artists fall between these extremes, with moderate variation in where their songs land.
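
The visual reading above could also be backed with numbers. The sketch below uses made-up positions for two hypothetical artists and computes the median and interquartile range (IQR) of `pos` per artist. A small IQR corresponds to a tight violin and a large IQR to a wide one. On the real data, `toy` would presumably be replaced by `df_top10`.

```python
import pandas as pd

# Hypothetical toy data: artist 'A' sits early in playlists, 'B' is spread out
toy = pd.DataFrame({
    'artist_name': ['A'] * 5 + ['B'] * 5,
    'pos': [0, 2, 4, 6, 8, 0, 25, 50, 75, 100],
})

# Quartiles of playlist position per artist, one row per artist
summary = (
    toy.groupby('artist_name')['pos']
    .quantile([0.25, 0.5, 0.75])
    .unstack()
)
summary.columns = ['q25', 'median', 'q75']

# Interquartile range: spread of the middle 50% of positions
summary['iqr'] = summary['q75'] - summary['q25']
print(summary.sort_values('iqr'))
```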